Horn Schunck Optical Flow

Filipe Barnabé - 202109213 \ João Ferreira - 202103619 \ Xavier Verbrugge - 202209118

The following project elaborates on Lucas-Kanade (LK) and Horn-Schunck (HS), two optical flow (OF) techniques, local and global respectively.\ These methods were extended with multi-channel, multi-resolution and iterative-refinement features to upgrade their capabilities.

Vanilla Lucas-Kanade

In the regular optical flow method, we assume the following:

  1. Brightness constancy
  2. Temporal persistence
  3. Spatial coherence

This gives us the following formulation for Lucas-Kanade:

$\begin{equation} A=\left[\begin{array}{cc} I_x\left(q_1\right) & I_y\left(q_1\right) \\ I_x\left(q_2\right) & I_y\left(q_2\right) \\ \vdots & \vdots \\ I_x\left(q_n\right) & I_y\left(q_n\right) \end{array}\right] \quad v=\left[\begin{array}{c} V_x \\ V_y \end{array}\right] \quad b=\left[\begin{array}{c} -I_t\left(q_1\right) \\ -I_t\left(q_2\right) \\ \vdots \\ -I_t\left(q_n\right) \end{array}\right] \end{equation}$

To find a solution for the optical flow, we apply the least-squares principle. In practice, we solved these systems with Cramer's rule:

$\begin{equation} \left[\begin{array}{c} V_x \\ V_y \end{array}\right]=\left[\begin{array}{cc} \sum_i I_x\left(q_i\right)^2 & \sum_i I_x\left(q_i\right) I_y\left(q_i\right) \\ \sum_i I_y\left(q_i\right) I_x\left(q_i\right) & \sum_i I_y\left(q_i\right)^2 \end{array}\right]^{-1}\left[\begin{array}{c} -\sum_i I_x\left(q_i\right) I_t\left(q_i\right) \\ -\sum_i I_y\left(q_i\right) I_t\left(q_i\right) \end{array}\right] \end{equation}$
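The 2x2 system above can be sketched in code as follows; this is a minimal illustration of solving the normal equations with Cramer's rule for one window (the function name and the singularity tolerance are ours, not the project's actual code):

```python
import numpy as np

def lucas_kanade_window(Ix, Iy, It):
    """Estimate (Vx, Vy) for one window from its spatial and temporal
    gradients Ix, Iy, It, sampled at the window pixels q_1..q_n."""
    Ix, Iy, It = Ix.ravel(), Iy.ravel(), It.ravel()
    # Entries of A^T A and A^T b from the least-squares formulation.
    sxx = np.sum(Ix * Ix)
    sxy = np.sum(Ix * Iy)
    syy = np.sum(Iy * Iy)
    bx = -np.sum(Ix * It)
    by = -np.sum(Iy * It)
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:  # degenerate window (aperture problem)
        return 0.0, 0.0
    # Cramer's rule on the 2x2 normal equations.
    vx = (bx * syy - sxy * by) / det
    vy = (sxx * by - sxy * bx) / det
    return vx, vy
```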

Lucas-Kanade with multiple channels

To further improve Lucas-Kanade, we can exploit the information that resides within color images. This provides a better estimate of the optical flow.

$\begin{equation} \min _{u, v} E_{L K}=\sum_{i=1}^3 G_\sigma *\left[I_{i x}(\mathbf{x}) \cdot u+I_{i y}(\mathbf{x}) \cdot v+I_{i t}(\mathbf{x})\right]^2 \end{equation}$

$\begin{equation} \begin{gathered} {\left[\begin{array}{cc} \sum_{i=1}^3 G_\sigma * I_{i x}^2 & \sum_{i=1}^3 G_\sigma * I_{i x} I_{i y} \\ \sum_{i=1}^3 G_\sigma * I_{i x} I_{i y} & \sum_{i=1}^3 G_\sigma * I_{i y}^2 \end{array}\right]\left[\begin{array}{c} u \\ v \end{array}\right]=} \\ =-\left[\begin{array}{c} \sum_{i=1}^3 G_\sigma * I_{i x} I_{i t} \\ \sum_{i=1}^3 G_\sigma * I_{i y} I_{i t} \end{array}\right] . \end{gathered} \end{equation}$
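The multi-channel system above can be sketched densely by summing the gradient products over the three channels and smoothing with $G_\sigma$; a minimal sketch assuming SciPy's `gaussian_filter` (the function name and `sigma` default are ours):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lk_multichannel(Ix, Iy, It, sigma=1.5):
    """Dense LK over a 3-channel image: Ix, Iy, It have shape (3, H, W).

    Per-channel gradient products are summed and then smoothed with
    G_sigma, which plays the role of the local window."""
    sxx = gaussian_filter(np.sum(Ix * Ix, axis=0), sigma)
    sxy = gaussian_filter(np.sum(Ix * Iy, axis=0), sigma)
    syy = gaussian_filter(np.sum(Iy * Iy, axis=0), sigma)
    bx = -gaussian_filter(np.sum(Ix * It, axis=0), sigma)
    by = -gaussian_filter(np.sum(Iy * It, axis=0), sigma)
    det = sxx * syy - sxy ** 2
    det = np.where(np.abs(det) < 1e-12, np.inf, det)  # guard singular pixels
    # Cramer's rule applied pixel-wise.
    u = (bx * syy - sxy * by) / det
    v = (sxx * by - sxy * bx) / det
    return u, v
```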

Lucas-Kanade with multiple channels, multi-resolution and iterative refinement

If the object moves a larger distance, the traditional optical flow method performs poorly, since the temporal persistence assumption is violated. This is why we use the Gaussian pyramid (coarse-to-fine) method to calculate the optical flow.

A pyramid is built from multiple copies of the same image. Each level is 1/4 of the size of the previous one, and when the levels are stacked together the structure looks like a pyramid. The lowest level has the highest resolution and the highest level the lowest resolution.

Algorithm used for downsampling:

  1. Convolve the image with the Gaussian kernel, G, with standard deviation $\sqrt{2 / 4 \tau}$
  2. Resize the image using bicubic interpolation.

Algorithm used for upsampling:

  1. Resize using bicubic interpolation.
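The two resampling steps above can be sketched as follows; SciPy's cubic-spline `zoom` stands in for bicubic interpolation, and `sigma=1.0` is an illustrative choice, not the report's exact value:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def build_pyramid(image, levels=3, sigma=1.0):
    """Gaussian pyramid: level 0 is the full-resolution image; each
    coarser level has half the width and height (1/4 of the pixels)."""
    pyramid = [image.astype(float)]
    for _ in range(levels - 1):
        blurred = gaussian_filter(pyramid[-1], sigma)          # anti-alias
        pyramid.append(zoom(blurred, 0.5, order=3, mode="nearest"))
    return pyramid

def upsample_flow(u, v, shape):
    """Resize a flow field to `shape`, scaling the displacements by the
    resolution ratio so they stay in pixel units of the finer level."""
    fy = shape[0] / u.shape[0]
    fx = shape[1] / u.shape[1]
    u_up = zoom(u, (fy, fx), order=3, mode="nearest") * fx
    v_up = zoom(v, (fy, fx), order=3, mode="nearest") * fy
    return u_up, v_up
```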

We need to propagate the optical flow from the lowest resolution to the highest in order to find an accurate estimate. We start by calculating the optical flow at the lowest resolution, i.e. at the top of the pyramid (multi-resolution). We then upsample the calculated optical flow and use it to warp the image I0 on the second level of the pyramid towards I1. Furthermore, the optical flow of the current iteration needs to be propagated to the next iteration, so we defined a function that calculates the optical flow in an iterative fashion. It uses the optical flow from the previous iteration to warp I0 towards I1, then calculates the optical flow between the warped image and the next image in order to refine the estimate (iterative refinement).
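The coarse-to-fine loop just described can be sketched as follows; `compute_flow` stands for any single-level solver (LK or HS) and `upsample_flow` for a flow-rescaling helper, both assumed rather than taken from the project's code:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp(I0, u, v):
    """Warp I0 towards I1 along the flow (u, v) with cubic sampling."""
    h, w = I0.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return map_coordinates(I0, [yy + v, xx + u], order=3, mode="nearest")

def coarse_to_fine(pyr0, pyr1, compute_flow, upsample_flow, iters=3):
    """Coarse-to-fine estimation with iterative refinement.

    pyr0/pyr1 are image pyramids (finest level first). At each level the
    flow from the coarser level is upsampled, I0 is warped towards I1,
    and the residual flow is estimated and accumulated."""
    u = np.zeros_like(pyr0[-1])
    v = np.zeros_like(pyr0[-1])
    for I0, I1 in zip(reversed(pyr0), reversed(pyr1)):  # coarsest first
        u, v = upsample_flow(u, v, I0.shape)            # propagate down
        for _ in range(iters):                          # iterative refinement
            du, dv = compute_flow(warp(I0, u, v), I1)   # residual flow
            u, v = u + du, v + dv
    return u, v
```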

We used three levels in the pyramid and three refinement iterations per level.

Horn-Schunck

Main formulas implemented in the Horn-Schunck code:

$$E=\iint\left[(I_x u + I_y v + I_t)^2 + \alpha^2\left(\|\nabla u\|^2 + \|\nabla v\|^2\right)\right] dx\,dy$$

$\begin{equation} T_{H S}(I(\mathbf{x}))=\sum_{i=1}^3 \nabla_3 I_i(\mathbf{x}) \cdot \nabla_3 I_i(\mathbf{x})^T \end{equation}$

$\begin{equation} \min _{\mathbf{w}} E_{H S}=\int_l\left(\mathbf{w}^T T_{H S}(I(\mathbf{x})) \mathbf{w}+\lambda|\nabla \mathbf{w}|^2\right) d x d y \end{equation}$

$\begin{equation} \begin{aligned} &\breve{I}_x^2 u+\breve{I}_x \breve{I}_y v+\breve{I}_x \breve{I}_t-\lambda \Delta u=0 \\ &\breve{I}_x \breve{I}_y u+\breve{I}_y^2 v+\breve{I}_y \breve{I}_t-\lambda \Delta v=0 \end{aligned} \end{equation}$ $\begin{equation} \breve{I}_j=I_{1 j}+I_{2 j}+I_{3 j} \end{equation}$

Definition of the masks

maskX = $\begin{bmatrix} -1 & 1\\ -1 & 1 \end{bmatrix}$

maskY = $\begin{bmatrix} 1 & 1\\ -1 & -1 \end{bmatrix}$

maskT = $\begin{bmatrix} 1 & 1\\ 1 & 1 \end{bmatrix}$

Laplacian Mask

maskHS = $\begin{bmatrix} \frac{1}{12} & \frac{1}{6} & \frac{1}{12}\\[6pt] \frac{1}{6} & -1 & \frac{1}{6}\\[6pt] \frac{1}{12} & \frac{1}{6} & \frac{1}{12} \end{bmatrix}$

Vanilla Horn-Schunck
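A minimal sketch of the vanilla Horn-Schunck fixed-point iteration using the masks defined above; `alpha` and `n_iter` are illustrative defaults, not the project's tuned values:

```python
import numpy as np
from scipy.ndimage import correlate

def horn_schunck(I0, I1, alpha=1.0, n_iter=100):
    """Vanilla Horn-Schunck iteration between frames I0 and I1."""
    kx = 0.25 * np.array([[-1.0, 1.0], [-1.0, 1.0]])   # maskX / 4
    ky = 0.25 * np.array([[1.0, 1.0], [-1.0, -1.0]])   # maskY / 4
    kt = 0.25 * np.array([[1.0, 1.0], [1.0, 1.0]])     # maskT / 4
    # maskHS with the central -1 removed: its positive weights sum to 1,
    # so correlating with it gives the neighbourhood average u_bar.
    avg = np.array([[1/12, 1/6, 1/12],
                    [1/6,  0.0, 1/6],
                    [1/12, 1/6, 1/12]])
    # Derivatives averaged over both frames (correlate applies the masks
    # as written, without the kernel flip of a convolution).
    Ix = correlate(I0, kx) + correlate(I1, kx)
    Iy = correlate(I0, ky) + correlate(I1, ky)
    It = correlate(I1, kt) - correlate(I0, kt)
    u = np.zeros_like(I0, dtype=float)
    v = np.zeros_like(I0, dtype=float)
    for _ in range(n_iter):
        u_bar = correlate(u, avg)
        v_bar = correlate(v, avg)
        # Fixed-point update derived from the Euler-Lagrange equations.
        common = (Ix * u_bar + Iy * v_bar + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_bar - Ix * common
        v = v_bar - Iy * common
    return u, v
```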

Horn-Schunck with multiple channels

Horn-Schunck with multiple channels, multi-resolution and iterative refinement

LK is a local technique: it assumes that the surrounding pixels belong to the same surface (constant flow in the neighbourhood), moving as one, and it is relatively robust to image noise. Since it works on a neighbourhood basis, this method can only be used when the displacement between frames is small enough for its equation to hold, which motivates the warping methods used later. Another improvement to this function is the use of the Shi-Tomasi corner-detection algorithm, which speeds up the process by looking directly at the positions most likely to yield a good match, a far smaller set of candidates.

HS is a global method that yields a high density of flow vectors. As a consequence, it is very sensitive to noise, which is propagated along with the neighbourhood information across large regions of the same intensity (it assumes smoothness in the flow over the whole image and tries to minimize distortions in the flow). The colored version can be minimized through the Euler-Lagrange equations with Neumann boundary conditions. This method also uses a Laplacian smoothness regularization.

These methods were upgraded with the multi-channel (color) components as well as the pyramidal multi-resolution scheme.\ The dataset used contains images at time zero and the next frame, at time one.

The pyramidal (coarse-to-fine estimation) scheme is the following.\ The calculation of the OF starts by alternately applying a low-pass Gaussian filter and a linear-interpolation downscaling step to the images, three times.\ Next, the optical flow is calculated, using HS and LK independently, resulting in two flow vectors for each iteration; a median filter is applied to regularize those vectors, which are then used to create a bicubically interpolated warped image.\


At that point, the OF just calculated is summed with the OF used to produce the warped image, yielding the final optical flow. The final OF vectors represent the movement that occurred between the images (or frames). For an easier representation, these vectors were color coded, with the attributed colors representing the magnitude and the orientation of the movement (flow): https://www.frontiersin.org/articles/10.3389/fnins.2016.00176/full\ End Point Error (EPE) is calculated by comparing the estimated optical flow vector with a ground-truth OF vector; it is defined as the magnitude of the difference between them.\ The EPE calculated in this project is based on the estimated OF, as the ground truth must be annotated at the source (if it is a video). Since EPE does not distinguish between angular deviation and speed difference, the Average Angular Error (AAE) is also calculated, from the angular difference between the flow vectors.
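The two error measures can be sketched as follows; the angular error here follows the common convention of comparing the space-time vectors (u, v, 1), which is one standard definition and may differ in detail from the project's code:

```python
import numpy as np

def endpoint_error(u, v, u_gt, v_gt):
    """Per-pixel End Point Error: magnitude of the flow difference."""
    return np.sqrt((u - u_gt) ** 2 + (v - v_gt) ** 2)

def angular_error(u, v, u_gt, v_gt):
    """Per-pixel Angular Error in degrees between the space-time
    vectors (u, v, 1) and (u_gt, v_gt, 1)."""
    num = u * u_gt + v * v_gt + 1.0
    den = np.sqrt(u**2 + v**2 + 1.0) * np.sqrt(u_gt**2 + v_gt**2 + 1.0)
    cos = np.clip(num / den, -1.0, 1.0)  # guard rounding outside [-1, 1]
    return np.degrees(np.arccos(cos))
```

Averaging these per-pixel maps over the image gives the EPE and AAE statistics reported below.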

The presented results deviate from those of the "motivation" paper, which may be attributed to the short duration of the project; nevertheless, some good results, representative of the flows that occurred, were achieved. The means of comparison were the AAE and the EPE, along with their standard deviations, applied to the same image sequences as the paper. The data from the paper indicate the superiority of the LK method, with both AAE and EPE values being lower; the same was confirmed by the current project.

Comparing our Average End-Point Error statistics (standard deviation in parentheses) to the statistics provided by the professor:

| Image | LK MC+MR+IR | Paper LK | HS MC+MR+IR | Paper HS |
|-------------|-------------|----------|-------------|----------|
| Dimetrodon | 1.95 (0.71) | 0.392 | 2.11 (1.52) | 0.454 |
| Grove2 | 3.03 (0.5) | 0.308 | 3.21 (1.71) | 0.290 |
| Grove3 | 3.84 (2.37) | 0.988 | 3.9 (2.73) | 0.854 |
| RubberWhale | 1.13 (0.49) | 0.345 | 1.25 (1.57) | 0.432 |
| Hydrangea | 3.65 (1.17) | 0.468 | 3.74 (1.95) | 0.495 |
| Urban2 | 8.32 (8.1) | 0.572 | 8.52 (8.59) | 0.330 |
| Urban3 | 7.22 (4.39) | 0.862 | 7.5 (5.53) | 0.800 |

Comparing our Average Angular Error statistics (standard deviation in parentheses) to the statistics provided by the professor:

| Image | LK MC+MR+IR | Paper LK | HS MC+MR+IR | Paper HS |
|-------------|-------------|--------------|-------------|---------------|
| Dimetrodon | 55 (13) | 8.48 (14.95) | 46 (26) | 10.02 (17.03) |
| Grove2 | 37 (6) | 4.08 (8.26) | 30 (27) | 3.9 (7.96) |
| Grove3 | 38.4 (23.1) | 8.18 (16.39) | 35 (34) | 7.29 (14.91) |
| Hydrangea | 36 (14) | 9.68 (18.74) | 32 (28) | 6.72 (14.09) |
| RubberWhale | 73 (11) | 6.87 (18.46) | 66 (18) | 10.70 (20.08) |
| Urban2 | 39 (29) | 7.78 (17.39) | 9 (58) | 5.57 (15.93) |
| Urban3 | 27 (15) | 5.53 (16.64) | 4 (54) | 12.98 (27.58) |

In the direct comparison of the LK method with multi-channel, multi-resolution and iterative-refinement processing, the paper's AAE values are better than ours by almost an order of magnitude and, likewise, our EPE values are higher than the paper's. The difference in values might be due to some inaccuracy in gathering the OF vectors. These observations hold in all scenarios.

As the last task, the optical flow was calculated for two videos (image sequences), producing a video representation of the flow using the same color coding and color space.